Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major Refactor - Faster, Robust Data Schemas #195

Merged
merged 20 commits into from
Nov 25, 2024
Merged

Conversation

devsjc
Copy link
Collaborator

@devsjc devsjc commented Nov 22, 2024

Modifies the core logic to be focussed around the downloading of a single int time as quickly as possible, processing required files in parallel.

Highlights:

  • Reduces ram usage by 10x
  • Faster processing even without concurrency
  • Enables writing to stores in parallel
  • Defines schema for produced data for consistency
  • Builds logic around DataArrays instead of Datasets
  • Better adherence to hexagonal architecture in structure

This introduces many breaking changes to prior usage:

  • Produced data is of a different shape with differently named components
  • New/renamed environment variables
  • CLI usage changed (to be simpler)
  • ECMWF MARS support gone (for the time being!)

@peterdudfield
Copy link
Contributor

peterdudfield commented Nov 25, 2024

different NWP provider, need to check

  • India ECMWF
  • India MO Global
  • India GFS
  • UK metoffice UKV
  • UK ECMWF

Forecasts

  • India Forecast app
  • UK PVnet Intraday
  • UK PVnet DA
  • UK site
  • UK national_xg

@peterdudfield
Copy link
Contributor

Add github issue for unit tests for provider test files

Copy link
Contributor

@peterdudfield peterdudfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill add comments later

@@ -0,0 +1,501 @@
"""Domain classes for store metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think of renaming ....

@devsjc devsjc merged commit 1cbf51c into main Nov 25, 2024
4 checks passed
@devsjc devsjc changed the title Major Refactor - Modify core logic to be faster Major Refactor - Faster, Robust Data Schemas Nov 25, 2024
@devsjc devsjc deleted the devsjc/major-refactor branch November 25, 2024 12:10
"the area of the grid square covered by any cloud "
"to the square's total area.",
units="UI",
limits=ParameterLimits(upper=1, lower=0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are any in percentage? i thnk ive seen cloud cover go from 0-100 before, but perhaps you've convered this

"""
store_range: str = coords.init_time[0].strftime("%Y%m%d%H")
if len(coords.init_time) > 1:
store_range = f"{coords.init_time[0]:%Y%m%d%H}-{coords.init_time[-1]:%Y%m%d%H}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you do store_range+=f"-{coords.init_time[-1]:%Y%m%d%H}" to shortened the code

@@ -0,0 +1,17 @@
"""Model Repositories

TODO: Add description
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add

@@ -0,0 +1,148 @@
"""Implementation of the NWP consumer services."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worht saying, this is for collecting live NWPs?

self.nr = notification_repository

@override
def consume(self, it: dt.datetime | None = None) -> ResultE[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean?

np.arange(45, 135, 0.234),
np.arange(135, 225, 0.234),
np.arange(225, 315, 0.234),
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could these be moved out to the top of this file?

Args:
filename: The name of the file.
it: The init time of the model run.
max_step: The maximum step in hours to consider.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it in general should change to init_time, not to many more characters and its slightly easier to read


Data is provided on a per-order basis, so the filestructure depends on the order ID.
For OCF's purposes, on file per parameter per step is requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weoth mentioning env vars up here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants